e61eaa38aed621dd776d0e67cfeee366-AuthorFeedback.pdf

Neural Information Processing Systems

This relationship is obvious if the transition and reward factorizations are the same, namely X[Ii] = X[Ji] for all i ∈ [m], in which case the FMDP has m independent components. The remarkable aspect here is that such a relationship holds even if the transition and reward factorizations differ arbitrarily. To summarize the insight: in the long run, different growth rates of the counters reflect the different importance of the components towards maximizing cumulative rewards, while early on, their growth can suffer large variance. Intuition: please see our Response 2.1 for an intuitive explanation of why we need the cross-component bonuses. Moreover, these cross-component bonuses offer new insight (see our Response 2.1).


HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

Wang, Zhilin, Zeng, Jiaqi, Delalleau, Olivier, Shin, Hoo-Chang, Soares, Felipe, Bukharin, Alexander, Evans, Ellie, Dong, Yi, Kuchaiev, Oleksii

arXiv.org Artificial Intelligence

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks related to STEM, coding, and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We demonstrate that HelpSteer3-Preference can also be applied to train Generative RMs, and show how policy models can be aligned with RLHF using our RMs. Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference Models (NVIDIA Open Model): https://huggingface.co/collections/nvidia/reward-models-68377c5955575f71fcc7a2a3


Adaptive Generation of Bias-Eliciting Questions for LLMs

Staab, Robin, Dekoninck, Jasper, Baader, Maximilian, Vechev, Martin

arXiv.org Artificial Intelligence

Large language models (LLMs) are now widely deployed in user-facing applications, reaching hundreds of millions worldwide. As they become integrated into everyday tasks, growing reliance on their outputs raises significant concerns. In particular, users may unknowingly be exposed to model-inherent biases that systematically disadvantage or stereotype certain groups. However, existing bias benchmarks continue to rely on templated prompts or restrictive multiple-choice questions that are suggestive, simplistic, and fail to capture the complexity of real-world user interactions. In this work, we address this gap by introducing a counterfactual bias evaluation framework that automatically generates realistic, open-ended questions over sensitive attributes such as sex, race, or religion. By iteratively mutating and selecting bias-inducing questions, our approach systematically explores areas where models are most susceptible to biased behavior. Beyond detecting harmful biases, we also capture distinct response dimensions that are increasingly relevant in user interactions, such as asymmetric refusals and explicit acknowledgment of bias. Leveraging our framework, we construct CAB, a human-verified benchmark spanning diverse topics, designed to enable cross-model comparisons. Using CAB, we analyze a range of LLMs across multiple bias dimensions, revealing nuanced insights into how different models manifest bias. For instance, while GPT-5 outperforms other models, it nonetheless exhibits persistent biases in specific scenarios. These findings underscore the need for continual improvements to ensure fair model behavior.
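The core mechanism described in this abstract — generating open-ended question pairs that differ only in a sensitive attribute, then mutating promising candidates — can be illustrated with a minimal sketch. All names (`counterfactual_pair`, `mutate`, the attribute table, and the toy mutation step) are hypothetical illustrations, not the authors' implementation:

```python
# Hypothetical sketch of counterfactual question generation over a
# sensitive attribute. A biased model may answer the two resulting
# questions systematically differently, which is what the evaluation
# framework described above is designed to surface.

# Assumed toy attribute table: each sensitive attribute maps to a pair
# of counterfactual values (real frameworks would cover many values).
SENSITIVE_ATTRIBUTE = {"sex": ("a man", "a woman")}


def counterfactual_pair(template: str, attribute: str) -> tuple[str, str]:
    """Instantiate one open-ended question twice, varying only the attribute."""
    value_a, value_b = SENSITIVE_ATTRIBUTE[attribute]
    return template.format(who=value_a), template.format(who=value_b)


def mutate(question: str) -> str:
    """Toy 'mutation' step: make the question more concrete.

    A real system would use an LLM to rewrite the question and keep
    mutants that elicit the largest response divergence.
    """
    return question + " Please give one specific recommendation."


template = ("My neighbour, {who}, asked me for career advice. "
            "What should I suggest?")
q_a, q_b = counterfactual_pair(template, "sex")
# q_a and q_b differ only in the sensitive attribute; comparing a
# model's answers to them is the counterfactual bias probe.
```

The design point is that the two questions share every token except the attribute value, so any systematic difference in the model's answers is attributable to that attribute rather than to phrasing.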


Incorporating the reviewers' suggestions. Response to Reviewer #1, Comment 1: "The significance of the proposed method is not very clear"

Neural Information Processing Systems

We greatly appreciate the reviewers' effort and helpful comments. Comment 1: "The significance of the proposed method is not very clear..." It also has great theoretical significance in the optimization area. Though the convergence rate of this method could be suboptimal, it is a practical way to […]. In addition, [6] shows some examples of saddle-point algorithms where projection onto the constraint sets is hard. Comment 2: "Why do we consider a nuclear norm constraint for this classification problem?" We find that this paper does not have Sections 5.4 and 5.6.


To Reviewer #1

Neural Information Processing Systems

We thank all the reviewers for their constructive feedback. Below we provide specific responses to each reviewer. We will add more results in the paper. In the following Response 2, we further highlight our important improvements overlooked by existing work. In Fig. 1(e) and Tables 4 and 5, S-GWL can be slightly worse than GWL on node correctness.



Provide two responses to the common concerns raised by the reviewers, and then reply to each reviewer, respectively

Neural Information Processing Systems

We would like to thank all the reviewers for your helpful comments and suggestions. As shown in Appendix A.3, the layer-wise GCN network has the highest computational complexity in the computational propagation flow. Please see the response in Common Response 2. For a fair comparison, we only report the result on the semi-supervised task. Please see the response in Common Response 2.